Intro
We talk often about player tendencies to achieve certain outcomes… a
hitter’s spray chart, a hitter’s launch angle breakdown, a pitcher’s
pitch distribution, a pitcher’s fly ball to ground ball ratio, etc.
However, there is a common event that often gets overlooked in baseball
analysis: the foul ball. Every now and then, a stat about foul balls
will emerge, such as how Joey
Votto pulled one foul ball into the seats over the first 2,138 plate
appearances of his career, but more attention is lent to the balls
that are put in play, watched or whiffed at.
ggplot(
dbGetQuery(bq, 'SELECT
description,
count / total `%`
FROM
(
SELECT
1 foo,
description,
COUNT(*) count
FROM
(
SELECT
REGEXP_REPLACE(description, "_*blocked_*", "") description,
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R"
)
GROUP BY
description
)
JOIN
(
SELECT
1 foo,
COUNT(*) total
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R"
)
USING
(foo);'),
aes(reorder(description, -`%`), `%`)
) +
geom_bar(stat = "identity", fill = "#0099f9") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "2023 Pitch Results", x = "", y = "Frequency") +
scale_y_continuous(labels = percent, limits = c(0, 0.4)) +
geom_text(aes(label = percent(`%`, accuracy = 0.1)), vjust = -0.5, size = 3)

Foul balls make up nearly 18% of pitch outcomes and can be
interpreted in a number of different ways. With less than two strikes… *
the pitch wasn’t really what the batter was looking for, but he swung
anyway. * the batter’s swing or timing was slightly off. With two
strikes… * the batter is just trying to stay alive with two strikes. *
the pitch wasn’t located well enough or wasn’t deceptive enough to miss
the bat entirely.
Fouls can feel like a neutral outcome when they happen on the field,
but they are almost always a positive outcome for either the pitcher or
the hitter. For example, with less than two strikes, a foul ball is a
positive outcome for the pitcher. Certain foul balls, like a towering
drive that hooks foul, can indicate that the batter has the upper hand,
but it still counts like any other strike, and the pitcher can breath a
sigh of relief. When there are two strikes, a foul ball is always going
to feel like the batter held strong, and the pitcher is annoyed that he
has to throw another pitch. In short, a foul ball is a win for the
pitcher with less than 2 strikes, and it is a win for the hitter with 2
strikes. With this in mind, I wanted to split hitter and pitcher foul
ball tendencies based on the count.
batters.2023.df <- statcast_leaderboard$batter %>%
filter(pa >= 200) %>%
mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
-est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
rename(PA = pa) %>%
# All pitches
inner_join(
dbGetQuery(bq, 'SELECT
batter player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes < 2 AND
description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
GROUP BY
batter;'),
by = "player_id"
) %>%
# 2-Strikes
inner_join(
dbGetQuery(bq, 'SELECT
batter player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes = 2 AND
description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
GROUP BY
batter;'),
by = "player_id"
) %>%
rename(`0/1-Strike Whiff %` = `0|1-Strike Whiff %`,
`0/1-Strike Foul %` = `0|1-Strike Foul %`)
pitchers.2023.df <- statcast_leaderboard$pitcher %>%
filter(pa >= 200) %>%
mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
-est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
rename(TBF = pa) %>%
# All pitches
inner_join(
dbGetQuery(bq, 'SELECT
pitcher player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes < 2 AND
description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
GROUP BY
pitcher;'),
by = "player_id"
) %>%
# 2-Strikes
inner_join(
dbGetQuery(bq, 'SELECT
pitcher player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip", "swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes = 2 AND
description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
GROUP BY
pitcher;'),
by = "player_id"
) %>%
rename(`0/1-Strike Whiff %` = `0|1-Strike Whiff %`,
`0/1-Strike Foul %` = `0|1-Strike Foul %`)
2023 Batters (min. 200 PAs)
10 Highest
kable(
batters.2023.df %>%
select(Name, PA, `0/1-Strike Foul %`) %>%
arrange(desc(`0/1-Strike Foul %`)) %>%
mutate(
`0/1-Strike Foul %` = percent(`0/1-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
0/1-Strike Foul %
|
|
Jake Cronenworth
|
522
|
48.78%
|
|
Isaac Paredes
|
571
|
48.43%
|
|
Bo Naylor
|
230
|
46.54%
|
|
Ozzie Albies
|
660
|
45.89%
|
|
Nathaniel Lowe
|
724
|
45.59%
|
|
Gio Urshela
|
228
|
45.05%
|
|
Zach McKinstry
|
518
|
45.03%
|
|
Max Kepler
|
491
|
44.52%
|
|
Adley Rutschman
|
687
|
44.26%
|
|
Pavin Smith
|
228
|
44.20%
|
kable(
batters.2023.df %>%
select(Name, PA, `2-Strike Foul %`) %>%
arrange(desc(`2-Strike Foul %`)) %>%
mutate(
`2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
2-Strike Foul %
|
|
Yordan Alvarez
|
496
|
46.95%
|
|
Sal Frelick
|
223
|
46.75%
|
|
Geraldo Perdomo
|
495
|
46.02%
|
|
Justin Turner
|
626
|
45.81%
|
|
Ty France
|
665
|
45.59%
|
|
Isiah Kiner-Falefa
|
361
|
45.55%
|
|
Cody Bellinger
|
556
|
45.43%
|
|
Daulton Varsho
|
581
|
45.11%
|
|
Will Smith
|
554
|
44.87%
|
|
Alec Burleson
|
347
|
44.87%
|
10 Lowest
kable(
batters.2023.df %>%
select(Name, PA, `0/1-Strike Foul %`) %>%
arrange(`0/1-Strike Foul %`) %>%
mutate(
`0/1-Strike Foul %` = percent(`0/1-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
0/1-Strike Foul %
|
|
Christian Bethancourt
|
332
|
29.48%
|
|
William Contreras
|
611
|
29.57%
|
|
Eloy Jiménez
|
489
|
29.92%
|
|
Javier Báez
|
547
|
30.26%
|
|
Kevin Kiermaier
|
408
|
30.29%
|
|
Aaron Judge
|
458
|
30.75%
|
|
Jose Siri
|
364
|
30.82%
|
|
Joey Wiemer
|
410
|
30.87%
|
|
Jordan Walker
|
465
|
30.91%
|
|
Luke Raley
|
406
|
31.07%
|
kable(
batters.2023.df %>%
select(Name, PA, `2-Strike Foul %`) %>%
arrange(`2-Strike Foul %`) %>%
mutate(
`2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
2-Strike Foul %
|
|
Jazz Chisholm Jr.
|
383
|
23.53%
|
|
Mark Vientos
|
233
|
25.32%
|
|
Jose Siri
|
364
|
27.57%
|
|
Eloy Jiménez
|
489
|
27.94%
|
|
Harrison Bader
|
344
|
28.64%
|
|
Brett Baty
|
389
|
28.69%
|
|
Francisco Alvarez
|
423
|
28.87%
|
|
Christopher Morel
|
429
|
29.08%
|
|
Mickey Moniak
|
323
|
29.51%
|
|
Paul DeJong
|
400
|
29.64%
|
# https://plotly.com/ggplot2/configuration-options/
# Highlight POIs: https://thiyanga.netlify.app/post/scatterplot/
batter.scatter.plot.df <- batters.2023.df %>%
mutate(diff = `2-Strike Foul %` - `0/1-Strike Foul %`,
color = as.factor(
ifelse(diff >= 0.06, "#025189",
ifelse(diff >= 0.03, "#0c9cb4",
ifelse(diff >= 0, "#94c280",
ifelse(diff >= -0.03, "#f1c359",
ifelse(diff >= -0.06, "#d03f2e",
"#982123"))))))) %>%
select(Name, `2-Strike Foul %`, `0/1-Strike Foul %`, color)
ggplotly(
ggplot(
batter.scatter.plot.df,
aes(x = `0/1-Strike Foul %`, y = `2-Strike Foul %`,
text = paste(Name, "\n0/1-Strike Foul %: ",
percent(`0/1-Strike Foul %`, accuracy = 0.1),
"\n2-Strike Foul %: ",
percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
geom_point(aes(color = color)) +
scale_colour_manual(values = levels(batter.scatter.plot.df$color)) +
labs(title = "2023 Batters (min. 200 PAs)") +
scale_x_continuous(labels = percent) +
scale_y_continuous(labels = percent) +
geom_abline(linetype = "dotted") +
theme(legend.position = "none"),
tooltip = "text") %>%
layout(annotations = list(
list(text = "Hover over any point for player details", x = 0.437, y = 0.462,
font = list(size = 10)),
list(text = "y = x", x = 0.484, y = 0.476, font = list(size = 10),
showarrow = FALSE))) %>%
config(displayModeBar = FALSE)
There is a positive correlation between 0/1-Strike Foul % and
2-Strike Foul % (correlation coefficient of 0.42), but substantial
variance exists, too. Take Yordan Alvarez, for example. With 0 or 1
strike, he fouls off only 34.1% of pitches, but that number increases to
46.9% with 2 strikes, which is the highest in all of MLB.
# ggplotly(
ggplot(
dbGetQuery(bq, 'WITH
team_foul_pct
AS
(
SELECT
CASE inning_topbot WHEN "top" THEN away_team ELSE home_team END team,
strikes,
COUNTIF(description = "foul") fouls,
COUNT(*) pitches,
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND
description IN ("foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked") /* Swing */
GROUP BY
team,
strikes
)
SELECT
team,
`0|1-Strike Foul %`,
`2-Strike Foul %`
FROM
(
SELECT
team,
SUM(fouls) / SUM(pitches) `0|1-Strike Foul %`
FROM
team_foul_pct
WHERE
strikes < 2
GROUP BY
team
)
JOIN
(
SELECT
team,
SUM(fouls) / SUM(pitches) `2-Strike Foul %`
FROM
team_foul_pct
WHERE
strikes = 2
GROUP BY
team
)
USING
(team);') %>%
rename(`0/1-Strike Foul %` = `0|1-Strike Foul %`) %>%
left_join(fromJSON("https://statsapi.mlb.com/api/v1/teams?lang=en&sportId=1&season=2023")$teams %>%
mutate(logo = paste("https://www.mlbstatic.com/team-logos/", id, ".svg", sep = "")) %>%
select(abbreviation, logo),
by = c("team" = "abbreviation")),
aes(x = `0/1-Strike Foul %`, y = `2-Strike Foul %`,
text = paste(team, "\n0/1-Strike Foul %: ",
percent(`0/1-Strike Foul %`, accuracy = 0.1),
"\n2-Strike Foul %: ",
percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
geom_image(aes(image = logo), size = 0.075, by = "height") +
labs(title = "2023 Teams (Batting)") +
scale_x_continuous(labels = percent) +
scale_y_continuous(labels = percent) +
geom_abline(linetype = "dotted") +
theme(legend.position = "none")#,

# tooltip = "text") %>%
# layout(annotations = list(
# list(text = "Hover over any point for team details", x = 0.377, y = 0.3965,
# font = list(size = 10)),
# list(text = "y = x", x = 0.388, y = 0.39, font = list(size = 10),
# showarrow = FALSE))) %>%
# config(displayModeBar = FALSE)